Abstract

Analyzing the computational complexity of evolutionary algorithms
(EAs) for binary search spaces has significantly informed our understanding
of EAs in general. With this paper, we start the computational complexity
analysis of genetic programming (GP). We set up several simplified
GP algorithms and analyze them on two separable model problems, ORDER
and MAJORITY, each of which captures a relevant facet of typical
GP problems. Both analyses give first rigorous insights into aspects of
GP design, highlighting in particular the impact of accepting or rejecting
neutral moves and the importance of a local mutation operator.



1 Introduction 

Because of the complexity of genetic programming (GP) variants and the chal- 
lenging nature of the problems they address, it is arguably impossible in most 
cases to make formal guarantees about the number of fitness evaluations needed 
for an algorithm to find an optimal solution. Current theoretical approaches 
investigate foundational aspects of GP tangential to this goal, such as schema 
theories, search spaces, bloat and problem difficulty [13]. However, in this work,
we instead choose to follow the path taken for evolutionary algorithms work- 
ing on fixed-length binary strings. Initial work on pseudo-Boolean functions 
illustrated the working principles of simple evolutionary algorithms (see e. g. 
[3 [ini H] ) ; subsequently, results have been derived for a wide range of classical 
combinatorial optimization problems such as shortest paths, maximum match- 
ings or minimum spanning trees (see e.g. [3]). These studies have contributed 
substantially to our theoretical understanding of evolutionary algorithms for 
binary representations. Poll et al. [13] state, "we expect to see computational
complexity techniques being used to model simpler GP systems, perhaps GP 
systems based on mutation and stochastic hill-climbing." This contribution is 
one fulfillment of this prediction: its goal is to show a GP variant that identifies 
optimal solutions in provably low numbers of fitness function evaluations for two 
much simplified, but still relevant, problems that exhibit a few simple aspects 
of program structure. 

The simple parameterized GP algorithm we analyze can succinctly be described
as both a hill climber and a randomized algorithm. It has four parametric
instantiations we call (1+1) GP-single, (1+1) GP-multi, (1+1) GP*-single,
and (1+1) GP*-multi that differ in the acceptance criterion and the size of
the mutation proposed. Initially, a solution is chosen at random. We produce
by random mutation exactly one offspring of the current solution, and replace 
the current solution by this proposal as specified by the acceptance criterion. 
The algorithm iterates until it finds an optimal solution. This simple form of 
GP algorithm has historical precedent in very early comparisons between Koza-style
genetic programming and GP stochastic iterated hill climbing [10, 11, 12],
though it does not include a finite bound on fitness evaluations, random restarts 
or a limit on how many times mutation will be applied to the current solution. 
Another simplification of the algorithm is that it uses a genetic operator that 
is as similar to bit-wise mutation as possible. A single bit-wise mutation is 
the smallest step possible in a binary EA's search space. Our mutation operator
makes the smallest alteration possible to the GP tree while respecting
the key properties of the GP tree search space: variable length and hierarchical 
structure. 

The two model problems we select for our analysis are ORDER and MAJORITY,
defined exactly as in [3]. We have chosen ORDER and MAJORITY
because they make complexity analysis tractable. They allow fitness function 
evaluation without explicitly executing the program defined by the GP tree. 
They are minimally sufficient to capture several key properties of GP, including
the existence of multiple optimal solutions, but they are not real-world application
problems. Neither are they ad-hoc toy problems intended to demonstrate
GP's strength (such as Boolean multiplexer for classical GP [5] or lawnmower 
for GP with automatically defined functions [6]). Each problem has a simple 
relation to more realistic GP problems: ORDER requires correct ordering as 
in conditional programs and MAJORITY requires the correct set of solution 
components. 

We proceed as follows: in Section 2 we formally describe the GP variants
and the two problems. This requires that we first describe program initialization
from a primitive set (Section 2.1) and our mutation operator, which is called
HVL-Mutate' (Section 2.2). We then proceed in Sections 3 and 4 with our analyses of ORDER
and MAJORITY in terms of the expected number of fitness evaluations until our
algorithms have produced a globally optimal solution for the first time. This is
called the expected optimization time of the algorithm. Our results are followed
by a discussion in Section 5 and by conclusions and future work in the final section.

2 Definitions 

2.1 Program Initialization 

To use tree-based genetic programming, one must first choose a set of primi- 
tives A, which contains a set F of functions and a set L of terminals. Each 
primitive has explicitly defined semantics; for example, a primitive might rep- 
resent a Boolean condition, a branching statement such as an IF-THEN-ELSE 
conditional, the value bound to an input variable, or an arithmetic operation. 
Functions are parameterized. Terminals are either functions with no parame- 
ters, i.e. arity equal to zero, or input variables to the program. 

In our derivations, we assume that a GP program is initialized by its parse 
tree construction. In general, we start with a root node randomly drawn from 
A and recursively populate the parameters of each function in the tree with 
subsequent random samples from A, until the leaves of the tree are all terminals. 
Functions constitute the internal nodes of the parse tree, and terminals occupy 
the leaf nodes. The exact properties of the tree generated by this procedure 
will not figure into the analysis of the algorithm, so we do not discuss them in 
depth. 

2.2 HVL-Mutate' 

The HVL-Mutate' operator is an update of O'Reilly's HVL mutation operator
[10, 11]; it is motivated by minimality rather than inspired by a tree-edit distance
metric. HVL first selects a node at random in a copy of the current parse
tree. Let us term this the currentNode. It then, with equiprobability, applies 
one of three sub-operations: insertion, substitution, or deletion. Insertion takes 
place above currentNode: a randomly drawn function from F becomes the par- 
ent of currentNode and its additional parameters are set by drawing randomly 
from L. Substitution changes currentNode to a randomly drawn function of
F with the same arity. Deletion replaces currentNode with its largest child
subtree, which often admits large deletion sub-operations. 

The operator we consider here, HVL-Mutate', functions slightly differently, 
since we restrict it to operate on trees where all functions take two parameters. 
Rather than choosing a node followed by an operation, we first choose one of 
the three sub-operations to perform. The operations then proceed as shown in 
Figure 1. Insertion and substitution are exactly as in HVL; however, deletion
only deletes a leaf and its parent to avoid the potentially macroscopic dele- 
tion change of HVL that is not in the spirit of bit-flip mutation. This change 
makes the algorithm more amenable to complexity analysis and specifies an 
operator that is only as general as our simplified problems require, contrasting 
with the generality of HVL, where all sub-operations handle primitives of any 
arity. Nevertheless, both operators respect the nature of GP's search among 
variable-length candidate solutions because each generates another candidate of 
potentially different size, structure, and composition. 

In our analysis on these particular problems, we make one further simpli- 
fication of HVL-Mutate': substitution only takes place at the leaves. This is 
because our two problems only have one generic "join" function specified, so 
performing a substitution anywhere above the leaves is a vacuous mutation. 
Such operations only constitute one-sixth of all operations, so this change has 
no impact on any of the runtime bounds we derive. 

Figure 1: Example of the operators from HVL-Mutate'.



2.3 Algorithms 

We define two genetic programming variants called (1+1) GP and (1+1) GP*.
Both algorithms work with a population of size one and produce in each iteration
one single offspring. (1+1) GP is defined in Algorithm 1 and accepts an offspring
if it is at least as fit as its parent.

Algorithm 1 ((1+1) GP).

1. Choose an initial solution X.

2. Set X' := X.

3. Mutate X' by applying HVL-Mutate' k times. For each application, randomly
choose to either substitute, insert, or delete.

• If substitute, replace a randomly chosen leaf of X' with a new leaf
$u \in L$ selected uniformly at random.

• If insert, randomly choose a node v in X' and select $u \in L$ uniformly
at random. Replace v with a join node whose children are u and v,
with the order of the children chosen randomly.

• If delete, randomly choose a leaf node v of X', with parent p and
sibling u. Replace p with u and delete p and v.

4. If $f(X') \geq f(X)$, set X := X'.

5. Go to 2.

(1+1) GP* differs from (1+1) GP by accepting only solutions that are strict
improvements (see Algorithm 2).

Algorithm 2 (Acceptance for (1+1) GP*).

4'. If $f(X') > f(X)$, set X := X'.

For each of (1+1) GP and (1+1) GP* we consider two further variants which
differ in using one application of HVL-Mutate' ("single") or in using more than
one ("multi"). For (1+1) GP-single and (1+1) GP*-single, we set k = 1, so that
we perform one mutation at a time according to the HVL-Mutate' framework.
For (1+1) GP-multi and (1+1) GP*-multi, we choose k = 1 + Pois(1), so that
the number of mutations at a time varies randomly according to the Poisson
distribution.

We will analyze these four algorithms in terms of the expected number of 
fitness evaluations to produce an optimal solution for the first time. This is 
called the expected optimization time of the algorithm. 
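
To make the four variants concrete, the following Python sketch implements one possible reading of Algorithms 1 and 2. Several details are our own assumptions rather than part of the definitions above: terminals are encoded as signed integers (+i for $x_i$, -i for its complement), the initial tree is grown by repeated random insertions, a deletion proposed on a single-leaf tree is skipped, a cap `max_evals` on fitness evaluations is added so that the loop always terminates, and all function names are hypothetical. The fitness function is passed in as a callable on the in-order leaf list.

```python
import math
import random


class Node:
    """Leaf when `label` is an int (+i for x_i, -i for its complement); else a join J."""

    def __init__(self, label=None, left=None, right=None):
        self.label, self.left, self.right = label, left, right


def all_nodes(root):
    """All nodes of the tree, internal nodes and leaves."""
    if root.label is not None:
        return [root]
    return [root] + all_nodes(root.left) + all_nodes(root.right)


def leaves_inorder(root):
    """Leaf labels in the order an in-order parse visits them."""
    if root.label is not None:
        return [root.label]
    return leaves_inorder(root.left) + leaves_inorder(root.right)


def clone(root):
    if root.label is not None:
        return Node(root.label)
    return Node(None, clone(root.left), clone(root.right))


def random_terminal(n, rng):
    """Uniform choice among the 2n terminals."""
    return rng.choice([1, -1]) * rng.randint(1, n)


def insert_at(v, u_label, rng):
    """Replace v (in place) by a join whose children are a new leaf u and the old v."""
    old, new = Node(v.label, v.left, v.right), Node(u_label)
    v.label = None
    v.left, v.right = (new, old) if rng.random() < 0.5 else (old, new)


def hvl_mutate_prime(root, n, rng):
    """Apply one uniformly chosen sub-operation to the tree in place."""
    op = rng.choice(["substitute", "insert", "delete"])
    if op == "insert":
        insert_at(rng.choice(all_nodes(root)), random_terminal(n, rng), rng)
        return
    v = rng.choice([x for x in all_nodes(root) if x.label is not None])
    if op == "substitute":
        v.label = random_terminal(n, rng)
        return
    # delete: overwrite the parent p of leaf v with v's sibling u
    for p in all_nodes(root):
        if p.label is None and (p.left is v or p.right is v):
            u = p.right if p.left is v else p.left
            p.label, p.left, p.right = u.label, u.left, u.right
            return
    # v is the root of a single-leaf tree: deletion is skipped (assumption)


def poisson1(rng):
    """Sample Pois(1) by multiplying uniforms until the product drops below 1/e."""
    k, p, limit = 0, rng.random(), math.exp(-1.0)
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def one_plus_one_gp(fitness, optimum, n, init_leaves,
                    strict=False, multi=False, max_evals=100000, seed=0):
    """strict=True gives (1+1) GP*, multi=True uses k = 1 + Pois(1) mutations."""
    rng = random.Random(seed)
    x = Node(random_terminal(n, rng))
    while len(leaves_inorder(x)) < init_leaves:
        insert_at(rng.choice(all_nodes(x)), random_terminal(n, rng), rng)
    fx, evals = fitness(leaves_inorder(x)), 1
    while fx < optimum and evals < max_evals:
        y = clone(x)
        for _ in range(1 + poisson1(rng) if multi else 1):
            hvl_mutate_prime(y, n, rng)
        fy = fitness(leaves_inorder(y))
        evals += 1
        if fy > fx or (fy == fx and not strict):
            x, fx = y, fy
    return x, fx, evals
```

The returned `evals` counts fitness evaluations, which is the quantity whose expectation is bounded in the analyses. Mutating nodes in place keeps the root object stable, so no parent pointers are needed.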

2.4 The ORDER problem 

We consider two separable problems called ORDER and MAJORITY that have 
an independent, additive fitness structure. They both admit multiple solutions 
on their objective function, which we feel is a key property of a model GP 
problem because it holds generally for all real GP problems. They also both use 
the same primitive set: 



• F := {J}, J has arity 2.

• $L := \{x_1, \bar{x}_1, \ldots, x_n, \bar{x}_n\}$, where $\bar{x}_i$ is the complement of $x_i$.

ORDER represents problems where the primitive sets include conditional 
functions, which gives rise to conditional execution paths. GP classification 
problems, for example, often employ a numerical comparison function (e.g. greater 
than X, less than X, or equal to X). This sort of function has two arguments 
(subtrees), one which will be executed only when the comparison returns true, 
the other only when it returns false [5]. Thus, a conditional function results 
in a conditional execution path, so the GP algorithm must identify and ap- 
propriately position the conditional functions to achieve the correct conditional 
execution behavior for all inputs. 

ORDER is an abstracted simplification of this challenge in that it determines 
the conditional path execution of a program by tree inspection rather than exe- 
cution. Instead of evaluating a condition test and then executing the appropriate 
condition body explicitly, an ORDER program's conditional execution path is 
determined by simply inspecting whether a primitive or its complement occurs 
first in an in-order leaf parse. Correct programs for the ORDER problem must 
express each positive primitive $x_i$ before its corresponding complement $\bar{x}_i$. This
correctness requirement is intended to reflect a property commonly found in 
the GP solutions to problems where conditional functions are used: there exist 
multiple solutions, each with different conditional execution paths. 

Algorithm 3 (f(X) for ORDER).

1. Derive conditional execution path P of X:

Init: l an empty leaf list, P an empty conditional execution path

1.1 Parse X inorder and insert each leaf at the rear of l as it is visited.

1.2 Generate P by parsing l front to rear and adding ("expressing") a
leaf to P only if neither it nor its complement is yet in P (i.e. has not
yet been expressed).

2. $f(X) = |\{x_i \in P\}|$.

For example, for a tree X with (after the inorder parse) $l = (x_1, \bar{x}_4, x_2, \bar{x}_1, x_3, \bar{x}_6)$,
$P = (x_1, \bar{x}_4, x_2, x_3, \bar{x}_6)$ and $f(X) = 3$ because $x_1, x_2, x_3 \in P$.
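
The ORDER fitness evaluation can be sketched in a few lines of Python. This is an illustration under our own encoding assumption (+i for $x_i$, -i for its complement); the function name `order_fitness` is hypothetical. The worked input is the leaf list $(x_1, \bar{x}_4, x_2, \bar{x}_1, x_3, \bar{x}_6)$.

```python
def order_fitness(leaves):
    """Return (f(X), P) for ORDER, given the in-order leaf list.

    Terminals are signed ints: +i stands for x_i and -i for its complement.
    """
    path = []      # the conditional execution path P
    seen = set()   # indices i for which x_i or its complement is already in P
    for leaf in leaves:
        if abs(leaf) not in seen:
            seen.add(abs(leaf))
            path.append(leaf)
    # Fitness counts the positive primitives that were expressed.
    return sum(1 for leaf in path if leaf > 0), path


print(order_fitness([1, -4, 2, -1, 3, -6]))  # (3, [1, -4, 2, 3, -6])
```

Only the first occurrence of each index matters, so the evaluation is a single pass over the leaf list and never executes the program.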

2.5 The MAJORITY problem 

MAJORITY is a GP equivalent of the GA OneMax problem [3]. MAJORITY 
reflects a general (and thus weak) property required of GP solutions: a solution 
must have correct functionality and no incorrect functionality. Like ORDER, 
MAJORITY is a simplification that uses tree inspection rather than program 
execution. A correct program in MAJORITY must exhibit at least as many
occurrences of a primitive as of its complement and it must exhibit all the
positive primitives of its terminal (leaf) set. Both the independent sub-solution
fitness structure and inspection property of MAJORITY are necessary to make 
our analysis tractable. 

Algorithm 4 (f(X) for MAJORITY).

1. Derive the combined execution statements S of X:

Init: l an empty leaf list, S an empty statement list.

1.1 Parse X inorder and insert each leaf at the rear of l as it is visited.

1.2 For $i \leq n$: if $\mathrm{count}(x_i \in l) \geq \mathrm{count}(\bar{x}_i \in l)$ and $\mathrm{count}(x_i \in l) \geq 1$,
add $x_i$ to S.

2. $f(X) = |S|$.

For example, for a tree X with (after the inorder parse) $l = (x_1, \bar{x}_4, x_2, \bar{x}_1, \bar{x}_3, \bar{x}_6, x_1, x_4)$,
$S = (x_1, x_2, x_4)$ and $f(X) = 3$.
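
The MAJORITY fitness evaluation admits an equally short Python sketch. As before, the signed-integer encoding (+i for $x_i$, -i for its complement) and the name `majority_fitness` are our own assumptions. The worked input is the leaf list $(x_1, \bar{x}_4, x_2, \bar{x}_1, \bar{x}_3, \bar{x}_6, x_1, x_4)$ with n = 6.

```python
from collections import Counter


def majority_fitness(leaves, n):
    """Return (f(X), S) for MAJORITY, given the in-order leaf list.

    Terminals are signed ints: +i stands for x_i and -i for its complement.
    S is reported as the list of indices i whose x_i is expressed.
    """
    counts = Counter(leaves)
    expressed = [i for i in range(1, n + 1)
                 if counts[i] >= counts[-i] and counts[i] >= 1]
    return len(expressed), expressed


print(majority_fitness([1, -4, 2, -1, -3, -6, 1, 4], 6))  # (3, [1, 2, 4])
```

Because only the counts matter, the order of the leaves has no effect on the result, in contrast to ORDER.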

3 Analysis for ORDER 

Here we present bounds for ORDER on the number of runtime evaluations 
needed in the execution of (1+1) GP and (1+1) GP*. 

We will analyze this GP problem using fitness-based partitions [2]. This
requires us to compute the probability of improving the fitness from k to k + 1
for each value of k between 0 and $n - 1$, inclusive. Although our HVL-Mutate'
operator is complex, we can obtain a lower bound on the probability of making 
an improvement by considering fitness improvements that arise from insertions. 
This is described in the following lemma. 

Lemma 1. Define $p_k$ to be the probability that we perform an insertion that
improves the fitness value of the GP tree from k to k + 1. For the single- and
multi-operation variants of (1+1) GP and (1+1) GP* applied to the ORDER
problem,

$$p_k = \Omega\left(\frac{(n-k)^2}{n \max\{T, n\}}\right),$$

where n is the number of variables and T is the number of leaves in the GP tree
at the particular iteration.

Proof. When the fitness value is k, it must be the case that k different $x_i$ appear
before their corresponding $\bar{x}_i$. To improve the fitness, we must insert one of the
$n - k$ unexpressed $x_i$ as a leaf that will be visited before any leaf containing the
corresponding $\bar{x}_i$. Assume for notational ease that these unexpressed $x_i$ are
indexed by $\{x_1, \ldots, x_{n-k}\}$. Define $A_i$ to be the event that we insert $x_i$ into
the tree with our mutation operation, and define $B_i$ to be the event that $x_i$ is
inserted before the corresponding $\bar{x}_i$. Given this, we can write out $p_k$ as follows.

$$p_k = \sum_{i=1}^{n-k} \Pr(A_i) \Pr(B_i \mid A_i)$$

With a single operation, the probability of choosing to insert a particular $x_i$ is
$\frac{1}{6n}$, since we choose to insert with probability $\frac{1}{3}$ and select the variable uniformly
at random from the set of 2n possible terminals. We can cover the multi-operation
case with this analysis as well because the number of operations is
sampled according to 1 + Pois(1), so the probability of performing exactly one
operation is $\frac{1}{e}$. The probability of $A_i$ is therefore at least $\frac{1}{6en}$, so in both the
single- and multi-operation cases, we have

$$p_k \geq \frac{1}{6en} \sum_{i=1}^{n-k} \Pr(B_i \mid A_i).$$

We need to analyze two cases in computing this sum. Preliminarily, we define
S to be the total number of nodes in the GP tree. Note that $S = 2T - 1$, so
$S = \Theta(T)$.

Case 1: $T \geq n - k$. We first note that the probability of inserting $x_i$ such
that it is visited between the (j - 1)st leaf and the jth leaf in the traversal is at
least $\frac{1}{2S}$, since we choose to insert at the jth leaf with probability $\frac{1}{S}$ and then
add $x_i$ as a left child of the new join node with probability $\frac{1}{2}$.

Inserting any of the $n - k$ unexpressed $x_i$ before the first leaf in the tree
clearly improves the fitness. If we insert at the second position instead, there
must still be at least $n - k - 1$ choices of $x_i$ that yield an improvement: there is
only one leaf that will be traversed before this position in the tree, so there is
at most one $\bar{x}_i$ expressed before this position. We can iterate this argument to
see that at the jth position, there are still $n - k - j + 1$ of the $x_i$ that can be inserted
for an improvement to the fitness. By reindexing the $x_i$, we then have that $x_i$
can be inserted in at least the first i positions in the tree. Using the fact that
the number of leaves T is at least $n - k$, we have that

$$p_k \geq \frac{1}{6en} \sum_{i=1}^{n-k} \frac{i}{2S} = \frac{(n-k)(n-k+1)}{24enS}.$$

Noting that $S = \Theta(T)$, the asymptotic result follows.

Case 2: $T < n - k$. We can apply the argument of Case 1 up to the Tth
position. After this, we have that for $n - k - T + 1$ of the unexpressed $x_i$, the
corresponding $\bar{x}_i$ appears nowhere in the tree, so the probability of an insertion
improving the fitness is 1. We also note that $S < 2n$ in this case, allowing us to
simplify our expression for $p_k$ into two terms: the first collects the
contributions of the first T positions, and the second collects those of the
variables whose complements are absent from the tree.

If $T = \Theta(n - k)$, then we lower-bound $p_k$ using only the first term, which behaves
asymptotically in this case as $\Omega\left(\frac{(n-k)^2}{n^2}\right)$. Otherwise, if $T = o(n - k)$, then we
use the second term, which then grows according to $\Omega\left(\frac{(n-k)^2}{n^2}\right)$. In either case,
because T is less than n, we have the desired asymptotic behavior. □



With this lemma, we can now state the general theorem about the number 
of fitness evaluations needed for our (1+1) GP variants. 

Theorem 1. The expected optimization time of the single- and multi-operation
cases of (1+1) GP and (1+1) GP* on ORDER is $O(nT_{\max})$ in the worst case,
where n is the number of $x_i$ and $T_{\max}$ denotes the maximal tree size at any stage
during the evolution of the algorithm.

Proof. We can apply Lemma 1 to these algorithms, which implies an asymptotic
lower bound on $p_k$, the probability of improving the fitness from k to k + 1 via an
insertion. This also serves as an asymptotic lower bound on the probability of
improving the fitness at all, and therefore provides an upper bound on the expected
time necessary to improve the fitness, regardless of whether or not we accept neutral moves. In
order to determine the total number of evaluations, we must sum the expected
number of fitness function evaluations over all intermediate fitness values, from
k = 0 to k = n - 1.

The expected optimization time is therefore upper bounded by

$$\sum_{k=0}^{n-1} \frac{1}{p_k} = \sum_{k=0}^{n-1} O\left(\frac{n \max\{T, n\}}{(n-k)^2}\right) = O\left(nT_{\max} \sum_{i=1}^{n} \frac{1}{i^2}\right) = O(nT_{\max}),$$

where the second equality follows from the fact that $T_{\max} \geq \max\{T, n\}$, and the
last equality follows from the fact that $\sum_{i=1}^{\infty} \frac{1}{i^2} \leq 2$. □
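
The summation over fitness levels can be checked numerically. The sketch below is only an illustration: the improvement probability has the shape given by Lemma 1, but the hidden constant `c` is a hypothetical parameter (set to 1 here), the tree size is fixed at `t_max` for every level, and both function names are our own.

```python
def fitness_level_upper_bound(improve_prob, n):
    """Sum of expected waiting times 1/p_k over fitness levels k = 0..n-1."""
    return sum(1.0 / improve_prob(k) for k in range(n))


def lemma1_style_prob(k, n, t_max, c=1.0):
    """Hypothetical p_k = (n-k)^2 / (c * n * max(T, n)), evaluated with T = t_max."""
    return (n - k) ** 2 / (c * n * max(t_max, n))


n, t_max = 50, 200
bound = fitness_level_upper_bound(lambda k: lemma1_style_prob(k, n, t_max), n)
# The sum of 1/i^2 is below 2, so the total stays below 2 * n * t_max.
print(bound, 2 * n * t_max)
```

With c = 1 the sum equals $n T_{\max} \sum_{i=1}^{n} 1/i^2$, which makes the role of the convergent series explicit.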

Note that most GP algorithms explicitly limit the maximum tree size that 
can be used in an algorithm. Choosing a linear maximum tree size that would 
still allow us to generate an optimal solution, i. e. a tree with at least n leaves, 
gives an algorithm that solves the ORDER problem in expected time $O(n^2)$.
However, it is also sometimes possible to show that the tree does not get too
big during the optimization process. We examine this for (1+1) GP*-single and 
present an upper bound on the expected optimization time. 

Corollary 1. The expected optimization time of (1+1) GP*-single on ORDER
is $O(n^2)$ if the tree is initialized with $O(n)$ terminals.

Proof. We note that the maximum value of the fitness is n, and the fitness is
integer-valued, so if it is strictly increasing with each operation that is accepted,
there must be no more than n operations accepted. In the single-operation
framework, each operation adds at most two nodes to the tree (if it is an insertion),
which means that $T_{\max} \leq O(n) + 2n = O(n)$ holds during the run of the
algorithm. □

The case of (1+1) GP*-multi is more difficult to analyze because the ex- 
pected length of accepted moves may be very different from the expected length 
of proposed moves, as conditioning on accepting the move will skew the distribution.
We conjecture that the bound from Corollary 1 holds in this case as
well, but do not present a proof of this. 

We also note that because of how our fitness-based partition argument is 
structured, invoking the average case does not enable us to find any better 
bounds. Although k will initially be somewhat greater than zero, we will generally
still need to improve the fitness $\Theta(n)$ times, so we will have the same
asymptotic result. 

4 Analysis for MAJORITY 

We next consider the MAJORITY problem. We start with some preliminary 
definitions. 

Definition 1. For a given GP tree, let $c(x_i)$ be the number of $x_i$ variables and
$c(\bar{x}_i)$ be the number of negated $x_i$ variables present in the tree. For a GP tree
representing a solution to the MAJORITY problem, we define the deficit in the
ith variable by

$$D_i = c(\bar{x}_i) - c(x_i).$$

Definition 2. In a GP tree for MAJORITY, we say that $x_i$ is expressed when
$D_i \leq 0$ and $c(x_i) > 0$.

The fitness of a tree T is simply the number of variables that are expressed.
We note a property of HVL-Mutate' for this particular problem that we will 
make use of later. 

Definition 3. The substitution decomposability property (SDP) for MAJOR- 
ITY states that a substitution is exactly equivalent to a deletion followed by an 
insertion, which are accepted or rejected as a unit. 

This property follows from the fact that the order of the terminals has no 
bearing on the fitness of a solution for MAJORITY. The variable to be replaced 
by substitution is selected uniformly at random from the set of leaves of the 
tree. This is identical to how the variable to be deleted is chosen when using 
the deletion operator. Substitution then inserts a variable selected uniformly at 
random from the set of possible terminals, just as the insertion operator does. 



We begin our analysis, in Section 4.1, with worst-case bounds for (1+1) GP-single,
(1+1) GP*-single, and (1+1) GP*-multi. (1+1) GP-single solves the problem
quite efficiently, yielding polynomial-time worst-case complexity. However,
not accepting neutral moves, as in (1+1) GP*, results in poor performance:
(1+1) GP*-single fails to terminate in the worst case, and (1+1) GP*-multi
requires a number of fitness evaluations exponential in the size of the initial tree.

In Section 4.2, we derive average-case bounds that assume the initial solution tree
has 2n terminals each selected uniformly at random from L. This random tree
initialization allows us to bound the maximum deficit in any variable. We show
that (1+1) GP-single runs in time $O(nT_{\max} \log\log n)$ in the average case. By
contrast, (1+1) GP*-single has a constant probability of failing to terminate,
and so the expected runtime is infinite.

4.1 Worst Case Bounds 

4.1.1 (1+1) GP-single 

We will show here some properties of (1+1) GP-single on MAJORITY and give 
a polynomial-time worst-case bound on the performance. Our analysis considers 
the evolution of the deficits Di over the course of the algorithm as n parallel 
random walks. We will show that each positive Di reaches zero at least as quickly 
as a balanced random walk, which is the condition for the corresponding xi to 
be expressed; this, then, gives us the expected number of operations that we 
are required to perform on a particular variable before it is expressed. Because 
these arguments do not easily extend to (1+1) GP-multi, we omit from this 
section any treatment of that case. 

We begin by establishing the validity of modeling the temporal sequence of
each of the $D_i$ as a random walk.

Lemma 2. For (1+1) GP-single on MAJORITY:

a) The probability of proposing an operation that changes either the number
of $x_i$ or the number of $\bar{x}_i$ is $\Omega\left(\frac{1}{n}\right)$.

b) If some $x_i$ has a deficit $D_i = d > 0$, we require in expectation $O(dT_{\max})$
proposed operations involving that variable before it is successfully expressed,
where $T_{\max}$ is the maximum number of nodes in our GP tree at any timestep of
the algorithm.

Proof. a) To see that a particular operation involves $x_i$ or $\bar{x}_i$ with probability
$\Omega\left(\frac{1}{n}\right)$, we simply note that the probability of inserting one of the two variables
is $\frac{1}{3} \times \frac{2}{2n} = \Omega\left(\frac{1}{n}\right)$.

b) We address each of the three types of operations in turn and show that
each is at least as favorable as a balanced random walk in terms of reducing $D_i$
to zero.

Insertion: The probability of inserting $\bar{x}_i$ into the tree is $\frac{1}{6n}$, which is the
same as the probability of inserting $x_i$. Therefore, given that we change $D_i$ with
an insertion, we increase it or decrease it in a balanced manner, with probability
$\frac{1}{2}$.

Deletion: The probability of a deletion changing $D_i$ is

$$\frac{c(x_i) + c(\bar{x}_i)}{T},$$

where T is the number of leaves in the GP tree. Given that we do such a deletion, we increase
$D_i$ with probability $\frac{c(x_i)}{c(x_i) + c(\bar{x}_i)}$ and decrease it with probability $\frac{c(\bar{x}_i)}{c(x_i) + c(\bar{x}_i)}$, since
we pick the variable to delete uniformly at random. However, note that because
$D_i > 0$, we have that $c(x_i) < c(\bar{x}_i)$, so the probability of decreasing $D_i$ is greater
than the probability of increasing it, so this is actually slightly better than a
balanced random walk for the purpose of reducing $D_i$.

Substitution: We now make use of the substitution decomposability property 
(SDP) defined previously to observe that substitution consists of a deletion 
followed by an insertion. Therefore, a substitution is simply equivalent to taking 
one or two steps that tend to reduce $D_i$ with probability at least $\frac{1}{2}$ if $D_i$ is greater
than 0.

Consider the 1-dimensional random walk on the integers 0, 1, ..., n, with n
being a reflecting barrier and 0 being an absorbing barrier. The expected time
to reach 0 when starting at k is $O(kn)$, following the analysis for random walks
on undirected graphs carried out in [1]. This is precisely the setting we have for
our random walk on the $D_i$ if we set $k = d$ and $n = T_{\max}$, so we have that the
random walk performed by the $D_i$ reaches zero after at most $O(dT_{\max})$ accepted
operations.
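
The $O(kn)$ hitting time can be verified exactly for small n. The sketch below assumes one specific model of the reflecting barrier, namely that from n the walk always steps to n - 1, and a fair step elsewhere; the function name is hypothetical. Under these assumptions the expected hitting time has the closed form $k(2n - k)$, which is at most $2kn$.

```python
def expected_hitting_times(n):
    """E[k] = expected number of steps for a fair walk on 0..n to reach 0 from k.

    Assumptions: 0 is absorbing, and from the reflecting barrier n the walk
    always moves to n - 1.
    """
    # Work with first differences d[k] = E[k] - E[k-1].
    # Reflecting barrier: E[n] = 1 + E[n-1], so d[n] = 1.
    # Interior states: E[k] = 1 + (E[k-1] + E[k+1]) / 2, so d[k] = d[k+1] + 2.
    d = [0] * (n + 1)
    d[n] = 1
    for k in range(n - 1, 0, -1):
        d[k] = d[k + 1] + 2
    times = [0] * (n + 1)
    for k in range(1, n + 1):
        times[k] = times[k - 1] + d[k]
    return times


print(expected_hitting_times(5))  # [0, 9, 16, 21, 24, 25]
```

The values match $k(2n - k)$ for n = 5, so starting from deficit d in a tree of size at most $T_{\max}$ gives at most $2dT_{\max}$ expected steps in this idealized walk.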

We now must address the question of how many operations on the variable
must be proposed in order to accept $O(dT_{\max})$ of them. Note that if $x_i$ is
unexpressed, any insertion or deletion affecting $D_i$ will be accepted, since it
cannot possibly decrease the fitness value. The probability of a substitution
affecting $D_i$ is, by the SDP, less than or equal to the probability that an insertion
affects $D_i$ plus the probability that a deletion affects $D_i$. Therefore, even if every
substitution is rejected, we still accept a constant fraction of proposed operations
that affect $D_i$, so we only require $\Theta(dT_{\max})$ proposed operations involving $x_i$
and $\bar{x}_i$ in order to have $\Theta(dT_{\max})$ accepted operations.

Once $D_i$ reaches zero, we are done and $x_i$ is expressed unless $c(x_i) = c(\bar{x}_i) = 0$.
In this case, we clearly cannot do any more deletes, but will either add in
an $x_i$ or $\bar{x}_i$ via an insertion or a substitution. Through either operation, we
add each variable with probability $\frac{1}{2}$, and therefore successfully express $x_i$ with
probability $\frac{1}{2}$. In the case where we insert an $\bar{x}_i$ and increase $D_i$ to one, we
again apply our one-dimensional random walk result with $k = 1$ and $n = T_{\max}$
to see that we will return to zero again after only $O(T_{\max})$ additional moves,
whereupon either $x_i$ is present in the tree and we are done or we can once
again attempt to add it. Because we expect to do this procedure only twice
before succeeding, it only adds $O(T_{\max})$ steps, and therefore does not change
our bound of $O(dT_{\max})$. □

This lemma allows us to establish an upper bound on the number of evalu- 
ations for (1+1) GP on MAJORITY given a bound on the largest deficit. 

Theorem 2. Let $D = \max_i D_i$ for an instance of MAJORITY initialized with
T terminals drawn from a set of size 2n (i.e. terminals $x_1, \ldots, x_n, \bar{x}_1, \ldots, \bar{x}_n$).
Then the expected optimization time of (1+1) GP-single is

$$O(n \log n + DT_{\max} n \log\log n)$$

in the worst case.

Proof. We draw upon a result from Myers and Wilf [8] about a generalized form
of the coupon collector problem. If we have n coupons and wish to acquire at
least k of each coupon, we need to draw, in expectation, $O(n \log n + kn \log\log n)$
coupons. When k is at least $\log n$, this is a slight improvement over the naive
bound of $O(kn \log n)$ from simply iterating the basic coupon collector problem
k times.
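
For intuition, the generalized coupon collector process can be simulated directly. The sketch below assumes uniform coupon probabilities (the proof works with slightly perturbed ones) and uses a hypothetical function name; it reports the number of draws for a single run rather than an expectation.

```python
import random


def draws_until_k_of_each(n, k, rng):
    """Draw uniformly from n coupon types until every type has been seen k times.

    Requires k >= 1. Returns the number of draws used in this run.
    """
    counts = [0] * n
    incomplete = n   # number of types seen fewer than k times so far
    draws = 0
    while incomplete > 0:
        c = rng.randrange(n)
        counts[c] += 1
        draws += 1
        if counts[c] == k:
            incomplete -= 1
    return draws


rng = random.Random(0)
runs = [draws_until_k_of_each(100, 5, rng) for _ in range(20)]
print(sum(runs) / len(runs))  # empirical mean over 20 runs
```

Averaging over more runs and varying k shows the mean growing far more slowly than k independent repetitions of the basic problem would suggest.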

Lemma 2 tells us two things. Firstly, we have that a proposed operation
involves $x_i$ or $\bar{x}_i$ with probability $\Omega\left(\frac{1}{n}\right)$, so we have a coupon collector problem
with slightly perturbed coupon probabilities. Secondly, we find that we need to
propose $O(DT_{\max})$ operations involving each terminal in order to express all of
our variables. Plugging these into the bound described above yields an asymptotic
requirement of $O(n \log n + DT_{\max} n \log\log n)$ fitness function evaluations,
as desired.

The only wrinkle in this picture is that the coupon collector assumes that a
variable is "complete" after a set number of coupons have been collected. While
we do not accept moves that reduce the fitness value, an expressed variable $x_i$
could become unexpressed if, during the course of a substitution operation,
another variable $x_j$ were simultaneously expressed. However, in this case, we
must have had $D_i = 0$ and $D_j = 1$, and we have merely reversed the two, which
amounts to a relabeling of the $x_i$ and $x_j$. Because the $D_i$ are the only state
variables that we care about in this case, this move effectively does nothing
except cause us to make a vacuous move. Because substitutions only make
up $\frac{1}{3}$ of all of the proposed moves, such wasted moves can only make up a
constant fraction of the total number of moves, and therefore do not change the
asymptotics. □

As a corollary of Theorem 2, we can bound the initial D by considering tree initialization.

Corollary 2. When MAJORITY is initialized with $m = O(n)$ terminals drawn
from a set of size 2n, the expected optimization time of (1+1) GP-single is

$$O(n^2 T_{\max} \log\log n).$$

Proof. This follows from Theorem 2 with D = m, since the deficit cannot be
greater than the number of terminals in the tree. □

We can consider the outcome of the worst case tree initialization both intuitively
and experimentally. We have D = m when all of the leaves consist of
instances of one bar variable, say $\bar{x}_1$. Since the $\bar{x}_1$ occupy such a large fraction
of the tree, they will frequently be substituted out or deleted. This suggests that
the balanced random walk argument is quite pessimistic given this circumstance.
We thus expect that, in practice, this initial condition will be quickly erased. If
we put $T_{\max} = O(n)$, we know, from the coupon collector problem, that after an
initial phase of $O(n \log n)$ steps, we will have proposed a deletion on every leaf
that was initialized in the GP tree. Because deletions are always accepted on
negated variables, we will have deleted all of the initial $\bar{x}_1$ variables by the end
of this "erasure" phase, and only expect to introduce at most $O(\log n)$ of any
particular bar variable through insertions and substitutions. This implies that,
after this relatively short phase, $D = O(\log n)$, giving an optimization time of
$O(n^2 \log n \log\log n)$, a bound very close to the average-case optimization time
we present in Section 4.2.1.

We experimented with this initialization to confirm our intuition. Figure 2 
shows the results of solving MAJORITY using (1+1) GP-single with increasing 
problem size and trees initialized with $2n$ leaves, each occupied by 
$\bar{x}_1$. We tracked the number of fitness evaluations required and, even 
though we imposed no bound on the tree size, the order of growth relative to $n$ 
appears to be only slightly superlinear. This empirical evidence supports the 
intuition that the worst-case performance is much closer to the average case 
than Corollary 2 would suggest. 
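The experiment described above can be approximated with a short simulation. The following is a minimal sketch, not the authors' implementation. It assumes that a GP tree can be modeled by its flat list of leaves (MAJORITY depends only on the leaf counts), that each mutation is an insert, delete, or substitute chosen with probability 1/3, and that the last remaining leaf is never deleted. The function names and the evaluation cap `max_evals` are illustrative choices.

```python
import random

def majority(leaves, n):
    """MAJORITY fitness: number of variables x_i that occur at least once
    and at least as often as their negation."""
    pos, neg = [0] * n, [0] * n
    for i, negated in leaves:
        if negated:
            neg[i] += 1
        else:
            pos[i] += 1
    return sum(1 for p, q in zip(pos, neg) if p >= 1 and p >= q)

def mutate_single(leaves, n, rng):
    """Apply one insert, delete, or substitute to a copy of the leaf list."""
    child = list(leaves)
    op = rng.choice(("insert", "delete", "substitute"))
    new_leaf = (rng.randrange(n), rng.random() < 0.5)  # (variable, negated?)
    if op == "insert":
        child.insert(rng.randrange(len(child) + 1), new_leaf)
    elif op == "delete":
        if len(child) > 1:  # assumption: never delete the last leaf
            child.pop(rng.randrange(len(child)))
    else:
        child[rng.randrange(len(child))] = new_leaf
    return child

def gp_single_evaluations(n, rng, max_evals=200_000):
    """Run (1+1) GP-single from the "bad" initialization (2n copies of the
    negated first variable). Neutral moves are accepted.
    Returns (final fitness, number of fitness evaluations)."""
    leaves = [(0, True)] * (2 * n)
    best, evals = majority(leaves, n), 1
    while best < n and evals < max_evals:
        child = mutate_single(leaves, n, rng)
        fitness = majority(child, n)
        evals += 1
        if fitness >= best:  # accept improvements and neutral moves
            leaves, best = child, fitness
    return best, evals

if __name__ == "__main__":
    rng = random.Random(0)
    for n in (4, 8, 16):
        runs = [gp_single_evaluations(n, rng)[1] for _ in range(5)]
        print(n, sum(runs) / len(runs))
```

Averaging the evaluation counts over more trials and larger $n$ gives a rough analogue of the growth curve discussed above, although the constants depend on the modeling assumptions.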




Figure 2: Plot of the average optimization time given a "bad" initialization of 
size $2n$ with $D_1 = 2n$ for $x_1$. Fifty trials were used to compute each 
point. Circles indicate the mean number of fitness function evaluations for each 
value of $n$, and error bars show the standard deviation of the 50 trials. 

4.1.2 (1+1) GP*

Unlike in the case of the ORDER problem, where accepting or not accepting 
neutral moves makes no difference in the performance of the algorithm, for 
MAJORITY, such a distinction matters tremendously. Intuitively, this behavior 
arises because there is a notion of "working towards" a solution here that is 
absent from the ORDER problem. In ORDER, our analysis relied on our ability 
to express an Xi by simply inserting it as a terminal early enough in the tree, 
which required only one step. However, in MAJORITY, if there are $k$ 
$\bar{x}_i$ and no $x_i$ present in the GP tree, at least $\lceil k/2 \rceil$ 
mutation operations will be required to make up this deficit, all but the last 
of which will be neutral moves. 

Because of the importance of neutral moves, we find that (1+1) GP*-single 
and (1+1) GP*-multi perform quite badly. Even when we initialize with a tree 
with size linear in n, the number of terminal symbols, we can demonstrate an 
initialization where (1+1) GP*-single never terminates and (1+1) GP*-multi 
takes an exponential amount of time to do so. Consider the tree $T_{lopt}$, 
which has as leaves the variables 

$x_1, x_2, \ldots, x_{n-1}, \underbrace{\bar{x}_n, \ldots, \bar{x}_n}_{n+1 \text{ of these}}$. 



Theorem 3. Let $T_{lopt}$ be the initial solution to MAJORITY. Then the expected 
optimization time of (1+1) GP*-single is infinite. 

Proof. With one move, the deficit in $x_n$ can be changed by at most two. There 
is a deficit of $n + 1$ to make up, which is impossible in a single move, and 
every move that leaves $x_n$ unexpressed is rejected. Therefore (1+1) GP*-single 
will never find its way out of this local optimum. □
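The local-optimum claim can be checked exhaustively for small $n$. The sketch below assumes the leaf-list model of a tree and takes the leaves of the initial tree to be $x_1, \ldots, x_{n-1}$ followed by $n + 1$ copies of the negated $x_n$, which is what the proof of Theorem 4 describes (fitness $n - 1$, deficit $n + 1$ in $x_n$). The helper names are hypothetical and variables are indexed from 0 in the code.

```python
def majority(leaves, n):
    """MAJORITY fitness: number of variables x_i that occur at least once
    and at least as often as their negation."""
    pos, neg = [0] * n, [0] * n
    for i, negated in leaves:
        if negated:
            neg[i] += 1
        else:
            pos[i] += 1
    return sum(1 for p, q in zip(pos, neg) if p >= 1 and p >= q)

def local_optimum_tree(n):
    """Leaves x_1, ..., x_{n-1}, then n + 1 copies of the negated x_n."""
    return [(i, False) for i in range(n - 1)] + [(n - 1, True)] * (n + 1)

def single_move_neighbors(leaves, n):
    """Yield every leaf list reachable by one insert, delete, or substitute."""
    terminals = [(i, negated) for i in range(n) for negated in (False, True)]
    for position in range(len(leaves) + 1):
        for terminal in terminals:
            yield leaves[:position] + [terminal] + leaves[position:]
    for position in range(len(leaves)):
        yield leaves[:position] + leaves[position + 1:]
        for terminal in terminals:
            yield leaves[:position] + [terminal] + leaves[position + 1:]

def has_improving_single_move(leaves, n):
    """True if some single operation strictly increases the fitness."""
    current = majority(leaves, n)
    return any(majority(neighbor, n) > current
               for neighbor in single_move_neighbors(leaves, n))

if __name__ == "__main__":
    for n in range(2, 7):
        tree = local_optimum_tree(n)
        print(n, majority(tree, n), has_improving_single_move(tree, n))
```

Because an algorithm that rejects neutral moves only ever leaves a state through a strictly improving move, the absence of such a move means the state is never left.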

Theorem 4. Let $T_{lopt}$ be the current solution to MAJORITY. Then the expected 
optimization time of (1+1) GP*-multi is at least exponential in $n$. 

Proof. The fitness value of $T_{lopt}$ is $n - 1$, with $x_1$ through $x_{n-1}$ 
expressed, so the only way to improve the fitness is to make a move that 
expresses $x_n$. Therefore, the moves that achieve this are the only moves that 
will be accepted. We compute the probability of making such a move in this 
configuration in order to determine the expected time to make such a move. 

Note that any mutation operation that successfully improves the fitness must 
make up for a deficit of $n + 1$, which requires at least 
$\lceil (n+1)/2 \rceil$ operations, assuming that we, in each case, substitute 
an $\bar{x}_n$ with an $x_n$. The number of moves per mutation is distributed as 
$1 + \mathrm{Pois}(1)$, so the Poisson random variable must take a value of at 
least $\lceil (n+1)/2 \rceil - 1$. The probability of this when $\lambda = 1$ is 
given by 

$$\sum_{i = \lceil (n+1)/2 \rceil - 1}^{\infty} \frac{e^{-1}}{i!} = O\left(\frac{1}{(\lceil (n+1)/2 \rceil - 1)!}\right) = O\left(\left(\frac{2e}{n}\right)^{n/2}\right)$$

by Stirling's formula. 

We can take this probability as a (very weak) upper bound on the probability 
of improving the fitness. Inverting it, we see that the expected number of moves 
required is $\Omega\left(\left(\frac{n}{2e}\right)^{n/2}\right)$. □
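The size of this bound can be illustrated numerically. The sketch below assumes that a mutation performs $1 + \mathrm{Pois}(1)$ operations and that at least $\lceil (n+1)/2 \rceil$ operations are needed, so the Poisson variable must reach $\lceil (n+1)/2 \rceil - 1$. It evaluates the Poisson tail directly and prints it next to $(2e/n)^{n/2}$. The function names are illustrative.

```python
import math

def poisson_tail(k, lam=1.0, terms=80):
    """Pr[Pois(lam) >= k], summed term by term. For lam = 1 the terms
    beyond k + 80 are far below double precision."""
    # First term exp(-lam) * lam**k / k!, computed in log space to avoid
    # overflowing on large factorials.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    for i in range(k, k + terms):
        total += term
        term *= lam / (i + 1)
    return total

def improvement_probability_bound(n):
    """Upper bound on the chance that a single mutation contains enough
    operations to repair a deficit of n + 1."""
    needed_operations = math.ceil((n + 1) / 2)
    return poisson_tail(needed_operations - 1)

if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        p = improvement_probability_bound(n)
        # tail probability, Stirling-style bound, implied waiting time
        print(n, p, (2 * math.e / n) ** (n / 2), 1 / p)
```

The reciprocal of the tail probability is a lower bound on the expected number of mutations before an improving one can occur, and it grows faster than any polynomial in $n$.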

4.2 Average Case Bounds 

To provide average case bounds we consider a GP tree which is initialized with 
what we term "unity expectation" : it has 2n terminals (leaves) each selected 
uniformly at random from the set of possible terminals. 

4.2.1 (1+1) GP 

The average case bound follows more or less directly from Theorem 2 once a 
result from the literature is applied to bound the maximum initial deficit. 

Corollary 3. For MAJORITY with a terminal set of size $2n$ under unity 
expectation initialization, the expected optimization time of (1+1) GP-single is 

$O(n T_{max} \log n)$. 

Proof. A result from Raab and Steger [14] tells us that, with probability at 
least $1 - O(1/n^k)$ for any integer $k$, no $\bar{x}_i$ appears more than 
$O(\log n / \log \log n)$ times in the GP tree, so 
$D = O(\log n / \log \log n)$. Set $k = 2$, so that the probability of having a 
larger deviation is $O(1/n^2)$. The worst-case bound of 
$O(n^2 T_{max} \log \log n)$ from Corollary 2 ensures that these uncommon cases 
contribute only an $O(T_{max} \log \log n)$ term to the expectation. 
Substituting $D = O(\log n / \log \log n)$ into the expression in Theorem 2 
gives us the desired bound for the common case, which is also the overall 
runtime bound. □
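The size of the initial deficit under this initialization can be sampled directly. The sketch below assumes that the deficit of variable $i$ is the number of negated occurrences minus the number of positive occurrences, and that $D$ is the maximum over all variables. It draws $2n$ terminals uniformly at random and compares the sampled maximum with $\log n / \log \log n$. The names are illustrative, and the comparison is only indicative because the bound hides a constant factor.

```python
import math
import random
from collections import Counter

def sample_max_deficit(n, rng):
    """Draw 2n terminals uniformly from the 2n possible terminals and return
    the largest value of c(negated x_i) - c(x_i) over all variables."""
    counts = Counter((rng.randrange(n), rng.random() < 0.5)
                     for _ in range(2 * n))
    return max(counts[(i, True)] - counts[(i, False)] for i in range(n))

def reference_scale(n):
    """The quantity log n / log log n appearing in the deficit bound."""
    return math.log(n) / math.log(math.log(n))

if __name__ == "__main__":
    rng = random.Random(0)
    for n in (100, 1000, 10000):
        samples = [sample_max_deficit(n, rng) for _ in range(20)]
        print(n, sum(samples) / len(samples), reference_scale(n))
```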

4.2.2 (1+1) GP* 

Assuming unity expectation initialization, we can improve on our result from 
Section 4.1.2 and show that (1+1) GP*-single has a constant probability of 
failing to terminate. Our general strategy will be to prove that there is 
constant probability that, when starting with a deficit of size three in $x_1$, 
this deficit will be preserved until the fitness is $n - 1$. At this point, when 
all the other variables are expressed, there will remain a gap that cannot be 
closed in a single step. Such a deficit could disappear over the course of the 
algorithm because substitution has the ability to shrink the deficit (by 
removing an $\bar{x}_1$ and replacing it with an $x_i$ in order to express that 
variable), but this proof shows that there is nonetheless a constant probability 
of the deficit being preserved. 

First, we establish a lemma about the prevalence of constant-size deficits 
arising based on our initialization. 

Lemma 3. Suppose we have a $2n$-length instance of the MAJORITY problem 
with unity expectation initialization. Let $A_k$ denote the event that 
$\bar{x}_1$ appears exactly $k$ times without $x_1$ appearing at all, where $k$ 
is any constant. Then $\Pr(A_k) = \Omega(1)$. 

Proof. To compute $\Pr(A_k)$, we count the number of $2n$-length sequences of 
terminals for which this is true and divide by the total number of possible 
sequences. Under $A_k$, we must have $k$ instances of $\bar{x}_1$ and zero 
instances of $x_1$, so there are $\binom{2n}{k}$ sets of positions that can be 
occupied by the $\bar{x}_1$, and the remaining $2n - k$ positions should each be 
occupied by one of the $2n - 2$ elements that are not $x_1$ or $\bar{x}_1$. In 
total, there are $(2n)^{2n}$ possible $2n$-length sequences of terminals. 
Combining these facts yields 

$$\Pr(A_k) = \frac{\binom{2n}{k} (2n-2)^{2n-k}}{(2n)^{2n}} = \Theta(1),$$

assuming that $k$ is a constant. □
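The counting argument can be evaluated exactly for concrete values. The sketch below assumes the count described in the proof: choose $k$ of the $2n$ positions for the negated variable and fill each remaining position with one of the other $2n - 2$ terminals. It compares the resulting probability with $e^{-2}/k!$, the value this expression approaches for fixed $k$ as $n$ grows. The function name is illustrative.

```python
import math
from fractions import Fraction

def prob_negated_only(n, k):
    """Exact probability that, among 2n uniformly drawn terminals, the negated
    first variable appears exactly k times and the positive one never."""
    favorable = math.comb(2 * n, k) * (2 * n - 2) ** (2 * n - k)
    return float(Fraction(favorable, (2 * n) ** (2 * n)))

if __name__ == "__main__":
    k = 3
    limit = math.exp(-2) / math.factorial(k)
    for n in (10, 100, 1000):
        print(n, prob_negated_only(n, k), limit)
```

For $k = 3$ the values settle near 0.0226, which is a constant independent of $n$, in line with the lemma.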

Next, we lower-bound the size of the GP tree when running (1+1) GP*-single 
on MAJORITY. The tree must be large enough so that we are not too likely to 
substitute out the $\bar{x}_1$ over the course of the algorithm. 

Lemma 4. Using (1+1) GP*-single on MAJORITY with any initialization of 
size $2n$, the size of the GP tree is always greater than $7n/6$ with 
probability one. 

Proof. A deletion can only improve the fitness if we delete some $\bar{x}_i$ 
when $c(x_i) = \eta$ and $c(\bar{x}_i) = \eta + 1$, with $\eta$ positive. Such a 
configuration requires at least three occurrences of $x_i$ and $\bar{x}_i$ in 
the GP tree, so at most $2n/3$ variables can be present in this fashion 
initially. Of the at least $n/3$ variables that remain, at most half can be 
expressed by a deletion, because they must first be put into this configuration 
during the course of a substitution that expresses some other variable $x_j$. 
Therefore, we are forced to accept at least $n/6$ insertions or substitutions 
over the course of the algorithm, giving us an upper bound of $5n/6$ on the 
number of deletions accepted. This in turn guarantees that our tree always 
remains larger than $2n - 2n/3 - n/6 = 7n/6$. □

Finally, we can prove the claim directly. 

Theorem 5. With probability $\Omega(1)$, the optimization time of (1+1) 
GP*-single with unity expectation initialization is infinite on MAJORITY. 

Proof. Lemma 3 tells us that, with a constant probability, we initialize one of 
the variables, say $x_1$, with $c(x_1) = 0$ and $c(\bar{x}_1) = 3$. We now show 
that, also with a constant probability, such a deficit is preserved during the 
course of the expression of at most $n - 1$ of the other variables. 

We make such an argument by induction. Define the $j$th step of the algorithm 
as the period after $j$ variables have been expressed, at the end of which 
we propose the move that expresses the $(j+1)$st variable. Suppose that at the 
$j$th step, it is true that $c(x_1) = 0$ and $c(\bar{x}_1) = 3$. The $(j+1)$st 
variable expressed cannot possibly be $x_1$, since there is no way to make up a 
deficit of three with a single move. If the move we accept to express the 
$(j+1)$st variable is an insertion or a deletion, we preserve our deficit of 
three and do not change the state of the variable $x_1$ at all, since we must 
either insert some variable in the set $\{x_2, \ldots, x_n\}$ or delete some 
variable in the set $\{\bar{x}_2, \ldots, \bar{x}_n\}$. 

If we express the $(j+1)$st variable with a substitution, however, it is 
possible that we might insert an $x_1$ or delete one of the $\bar{x}_1$. An 
accepted substitution must either replace some variable with a variable in the 
set $\{x_2, \ldots, x_n\}$ or substitute out some variable in the set 
$\{\bar{x}_2, \ldots, \bar{x}_n\}$. However, a substitution also involves an 
"extraneous" insertion or deletion, by the SDP. If this operation impacts a 
variable different from the $(j+1)$st variable we are expressing, it must be an 
operation that, on its own, would keep the fitness constant. For an extraneous 
insertion, we note that it is always admissible to insert any of the $n$ symbols 
in the set $\{x_1, x_2, \ldots, x_n\}$ without decreasing the fitness. Therefore, 
the probability of inserting neither $x_1$ nor $\bar{x}_1$ in a neutral or 
better move is at least $1 - 2/n$. 

If the extraneous operation is a deletion, we note that it is always possible 
to delete at least $(T - n)/2$ terminals, where $T$ is the current tree size. 
Any variable expressed with $c(x_i) = 1$ and $c(\bar{x}_i) = 0$ cannot be 
removed, so there might be as many as $n$ terminals forbidden for this reason. 
For any variable not in this configuration, we have one of two cases. If the 
variable is unexpressed or is expressed with a deficit less than or equal to 
$-1$, any occurrence of $x_i$ or $\bar{x}_i$ can safely be deleted without 
decreasing the fitness. If the variable is expressed with a deficit of zero, 
there must be at least as many $\bar{x}_i$ as there are $x_i$, and any of these 
$\bar{x}_i$ can be safely deleted. Therefore, we set aside at most $n$ 
"singleton" symbols that cannot be deleted, and of those remaining, it must 
always be acceptable to delete at least half, yielding $(T - n)/2$. 

We therefore preserve our three $\bar{x}_1$ variables with probability at least 

$$\frac{\frac{T - n}{2} - 3}{\frac{T - n}{2}} = 1 - \frac{6}{T - n}.$$

We now invoke the result from Lemma 4. Because the size of the tree is at 
least $7n/6$ at all times, we can lower-bound the probability of preserving the 
$\bar{x}_1$ as $1 - 36/n$. 

These situations (extraneous inserts and extraneous deletes) are mutually 
exclusive, and of the two, the deletes are the more probable to interfere with 
our $\bar{x}_1$ setup. Nevertheless, the probability of preserving our deficit 
of three in $x_1$ from the $j$th step to the $(j+1)$st step is at least 
$1 - 36/n$. Because there are at most $n - 1$ such steps of the algorithm, our 
overall probability of preserving the deficit is at least 

$$\left(1 - \frac{36}{n}\right)^{n-1} = \Omega(1).$$

We have constant probability of initializing with such a deficit, and a constant 
probability of preserving the deficit, in which case the algorithm never 
terminates. Therefore, with constant probability, (1+1) GP*-single never 
terminates on MAJORITY. □

Corollary 4. Using unity expectation initialization, the expected optimization 
time of (1+1) GP*-single on MAJORITY is infinite. 

Proof. This follows directly from Theorem 5. □

While Theorem 5 does demonstrate that the probability of getting stuck in 
a local optimum is at least a constant, the actual constant yielded by the proof 
is rather small. However, our proof technique made several very conservative 
assumptions for simplicity. To investigate further, we tried to solve MAJORITY 
with (1+1) GP*-single experimentally, observing when we would get stuck in 
a local optimum. Experimentally, the probability of (1+1) GP*-single 
failing to converge to the optimum is quite high, as demonstrated by 
Figure 3. 




Figure 3: Plot showing the probability of (1+1) GP*-single failing to terminate 
on MAJORITY when the initial solution tree has 2n terminals each selected 
uniformly at random, i.e. with unity expectation initialization. Each probability 
was determined empirically over the course of 100 simulations for each value of 
n. 
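An experiment of this kind can be sketched as follows. This is not the authors' code. It assumes the leaf-list model of a tree and single-operation mutations chosen uniformly from insert, delete, and substitute. It also uses a stopping rule derived from the arguments above: a single operation changes the deficit of a variable by at most two, so once every unexpressed variable has a deficit of at least three, no move can be accepted and the run can be declared stuck. All names are illustrative.

```python
import random

def counts(leaves, n):
    """Return (pos, neg): occurrences of x_i and of its negation."""
    pos, neg = [0] * n, [0] * n
    for i, negated in leaves:
        (neg if negated else pos)[i] += 1
    return pos, neg

def fitness(pos, neg):
    """MAJORITY fitness computed from the occurrence counts."""
    return sum(1 for p, q in zip(pos, neg) if p >= 1 and p >= q)

def is_stuck(pos, neg):
    """True if every unexpressed variable has deficit >= 3 (vacuously true
    when all variables are expressed, so check the fitness first)."""
    return all(q - p >= 3 for p, q in zip(pos, neg)
               if not (p >= 1 and p >= q))

def gp_star_single_succeeds(n, rng):
    """Run (1+1) GP*-single from a uniformly random tree with 2n leaves.
    Returns True on reaching fitness n, False once the run is stuck."""
    leaves = [(rng.randrange(n), rng.random() < 0.5) for _ in range(2 * n)]
    pos, neg = counts(leaves, n)
    best = fitness(pos, neg)
    while best < n:
        if is_stuck(pos, neg):
            return False
        child = list(leaves)
        op = rng.randrange(3)
        new_leaf = (rng.randrange(n), rng.random() < 0.5)
        if op == 0:
            child.insert(rng.randrange(len(child) + 1), new_leaf)
        elif op == 1:
            child.pop(rng.randrange(len(child)))
        else:
            child[rng.randrange(len(child))] = new_leaf
        child_pos, child_neg = counts(child, n)
        child_fitness = fitness(child_pos, child_neg)
        if child_fitness > best:  # neutral moves are rejected
            leaves, pos, neg, best = child, child_pos, child_neg, child_fitness
    return True

def failure_rate(n, trials, seed=0):
    """Fraction of runs that end stuck rather than at the optimum."""
    rng = random.Random(seed)
    failures = sum(not gp_star_single_succeeds(n, rng) for _ in range(trials))
    return failures / trials

if __name__ == "__main__":
    for n in (4, 8, 12):
        print(n, failure_rate(n, trials=20))
```

The stuck test makes each run terminate with probability one, so failures are detected exactly rather than inferred from a timeout.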

However, we do not know how to show a similar result for (1+1) GP*-multi. 
We note that difficult MAJORITY instances, such as $T_{lopt}$ presented in 
Section 4.1.2, are exponentially unlikely to occur when the initial solution 
tree has $2n$ terminals each selected uniformly at random. From Raab and Steger 
[14], we know that deficits larger than logarithmic occur with exponentially 
small probability, and in any case large deficits should tend to equalize over 
the course of the algorithm execution, even if we only accept a linear number of 
moves. However, if the last unexpressed variable has a deficit of size $k$, we 
will require at least $\Omega(n^{k/2})$ steps to correctly substitute out enough 
instances of $\bar{x}_i$ for $x_i$ even in the best case, so unless $k$ can be 
bounded by a constant, we will have an expected runtime that is superpolynomial. 



5 Summary and Discussion 

Table 1 aggregates our expected optimization time results for all algorithm 
variants and each problem. 





ORDER      (1+1) GP                       (1+1) GP*
single     O(n T_max) w.c. †              O(n^2) w.c.
multi      O(n T_max) w.c. †              O(n T_max) w.c. †

MAJORITY   (1+1) GP                       (1+1) GP*
single     O(n^2 T_max log n) w.c. †      Ω(∞) a.c.
           O(n T_max log n) a.c.
multi      ?                              Ω((n/(2e))^(n/2)) w.c.

Table 1: Results of the computational complexity analysis for our sample 
problems. We use w.c. to denote a worst-case bound and a.c. to denote an 
average-case bound. The daggers indicate where we conjecture that better bounds 
exist. 

From the perspective of a GP practitioner, the insights provided by this 
rigorous analysis may be more valuable than the complexity results themselves. 
In Section 5.1 we discuss how our treatment sheds light upon the important but 
subtle interactions between a problem, the acceptance criterion, and the genetic 
operator. In Section 5.2 we discuss the impact of the sub-operations on the 
mutation operator we considered. In Section 5.3 we address the implications of 
our design and analysis methodology for practical GP algorithm design. Section 
5.4 covers some of our analysis techniques, and finally Section 5.5 presents 
future work avenues and concludes. 



5.1 Accepting Neutral Moves in ORDER and MAJORITY 

It might initially seem immaterial whether or not we accept neutral moves with 
our genetic operator for ORDER and MAJORITY. However, our analysis pro- 
vides rigorous evidence that the differences in performance between (1+1) GP 
and (1+1) GP* are substantial for both of these problems. Similar results have 
already been obtained in the context of evolutionary algorithms for binary 
representations. 

ORDER's focus on condition semantics gives it the property that only the 
first occurrence of each terminal matters. A large tree makes the probability of 
improvements smaller because many of the mutations will change variables that 
have no effect on expression, being sequentially later than earlier occurrences 
of those same variables. Therefore, not accepting neutral moves helps prevent 
"bloat" and using (1+1) GP* is significantly advantageous. (1+1) GP's accep- 
tance of neutral moves causes a feedback loop that stimulates growth of the 
tree: there is a slight bias towards accepting insertions as opposed to deletions, 
which makes the tree large, which increases the time to find an improvement 
and results in many neutral insertions, which increase the tree size even more. 
In general, to solve ORDER with runtime performance that respects the com- 
plexity analysis, the tree must not grow too large, and not accepting neutral 
moves assures this. 
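The effect of tree size on ORDER can be illustrated with a small computation. The sketch below assumes an ORDER-style fitness in which the leaves are scanned from left to right, the first occurrence of $x_i$ or its negation decides variable $i$, and the fitness counts the variables decided by a positive occurrence. It then measures what fraction of all single substitutions leave the fitness unchanged. The function names are hypothetical.

```python
def order_fitness(leaves):
    """ORDER-style fitness (assumed definition): the first occurrence of a
    variable or its negation decides it; count the positive decisions."""
    decided = {}
    for i, negated in leaves:
        decided.setdefault(i, not negated)  # later occurrences are ignored
    return sum(decided.values())

def neutral_substitution_fraction(leaves, n):
    """Fraction of all single substitutions that do not change the fitness."""
    base = order_fitness(leaves)
    terminals = [(i, negated) for i in range(n) for negated in (False, True)]
    total = neutral = 0
    for position in range(len(leaves)):
        for terminal in terminals:
            child = leaves[:position] + [terminal] + leaves[position + 1:]
            total += 1
            neutral += order_fitness(child) == base
    return neutral / total

if __name__ == "__main__":
    short = [(0, False), (1, True)]
    bloated = short + [(0, True)] * 20  # leaves behind the first occurrences
    print(neutral_substitution_fraction(short, 2))
    print(neutral_substitution_fraction(bloated, 2))
```

In the bloated tree almost every substitution lands on a leaf that comes after the deciding occurrences, so nearly all moves are neutral and improvements become rare.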

Solving MAJORITY, we see the opposite effect: (1+1) GP very handily 
beats (1+1) GP* in terms of expected optimization time. Neutral moves have 
the effect of balancing both the relative frequency of variables and the number 
of positive versus negative occurrences. This draws us toward a very favor- 
able average case where every variable is either expressed or very close to being 
expressed. If neutral moves are not accepted, improvement can frequently stag- 
nate, underscoring the fact that there are large flat regions in the search space. 
(1+1) GP is better equipped to escape these than (1+1) GP*, so one should, in 
fact, clearly choose (1+1) GP so that there is a guarantee of termination and 
to avoid the exponential-time worst-case associated with (1+1) GP*-multi. 

Overall, these results highlight the fact that, in choosing whether or not to 
accept neutral moves, one should consider their general effect with respect to 
both the fitness landscape and growth in tree size. Tying this knowledge into 
expected optimization time also requires an understanding of the mechanisms 
by which the fitness increases. We recognize that for ORDER and MAJORITY, 
this is much easier to do rigorously than for more realistic problems. However, 
perhaps our exercise with ORDER and MAJORITY can provide intuitive in- 
sight to GP practitioners. 

5.2 Mutation 

Our results also tell us more about our HVL-Mutate' framework, and show that 
it has several interesting properties and behaves quite differently for the two 
problems. 

Interestingly, the analysis for ORDER, which uses the fitness partition method, 
only relies on the use of insert. However, we could not run the algorithm with 
only insertion, because the tree would get very large and the expected time to 
termination would actually become infinite, in the absence of a strict bound on 
the tree size. Therefore, deletions are necessary to control the size of the tree, 
if nothing else. We could, however, envision designing an alternative operator 
without substitution, whose insertion and deletion probabilities are imbalanced. 
By choosing these probabilities appropriately, we could prevent the tree size 
from getting too large (without explicitly bounding it) while still doing as many 
insertions as possible, thus allowing the algorithm to reach the optimum with 
the lowest number of fitness evaluations. 

For MAJORITY, the substitution decomposability property indicates that, 
for this particular problem, the substitution operation is more complex than 
insertion and deletion and is, in fact, a macro operator which is a combination 
of the two. A superficial glance at the operator does not necessarily reveal 
this; in fact, it is tempting to believe that substitution is generally the least 
complex of the operators, because it most closely resembles a bit-flip in a fixed- 
length representation and does not change the size or structure of the GP tree 
at all. However, it could be beneficial to dispense with substitution altogether; 
this would simplify some of the analysis and achieve the goal of making our 
mutation operator as "local" as possible for the given problem. 

Locality is a property that depends on fitness landscape and operator. Here 
we see the interaction explicitly and reflect upon the influence of the fitness 
landscape, which itself depends upon the genotype to phenotype mapping. A 
substitution makes the same genotypic change in both problems because MAJORITY 
and ORDER share the same primitive set. But in MAJORITY, to a first-order 
approximation, the amortized average change in fitness is larger than in ORDER 
because ORDER's expression mechanism places emphasis on only the front of 
the parsed leaf list whereas MAJORITY's depends on the entire set of leaves. 

Both problems also reveal a fundamental asymmetry between deletions and 
insertions. Insertions select uniformly from the set of possible terminals, so each 
terminal is affected with the same probability, but deletions select uniformly 
from the set of leaves, so the probability of the operation changing a particular 
terminal depends on the concentration of that terminal in the tree. In the 
case of ORDER, this has ramifications for the evolution of the tree size over 
time, because insertions end up being less likely to decrease the fitness than 
deletions, so the tree grows over time. For MAJORITY, this phenomenon has 
a positive effect, rather than a negative one: if there are more occurrences of a 
negated variable than its corresponding positive variable, we will tend to remove 
those negations with higher probability, and simultaneously balance the relative 
concentration of each variable. 
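The asymmetry can be made concrete with exact probabilities. The sketch below assumes the leaf-list model of a tree: an insertion draws its terminal uniformly from the $2n$ possible terminals, while a deletion draws a leaf uniformly from the current tree. The function names and the example tree are illustrative.

```python
from collections import Counter
from fractions import Fraction

def insertion_probability(n):
    """Chance that an insertion adds one particular terminal out of 2n."""
    return Fraction(1, 2 * n)

def deletion_probabilities(leaves):
    """Chance that a deletion removes each terminal: its share of the leaves."""
    tally = Counter(leaves)
    return {terminal: Fraction(count, len(leaves))
            for terminal, count in tally.items()}

if __name__ == "__main__":
    # Six negated x_1, two positive x_1, two positive x_2 (indexed from 0).
    leaves = [(0, True)] * 6 + [(0, False)] * 2 + [(1, False)] * 2
    print(insertion_probability(2))
    for terminal, probability in deletion_probabilities(leaves).items():
        print(terminal, probability)
```

In this example a deletion hits the over-represented negated variable with probability 3/5, while an insertion adds it with probability only 1/4, which is the rebalancing effect described for MAJORITY.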

5.3 Informing GP Practice 

This analysis also prompts one to revisit and review assumptions about the 
necessity of a population and a crossover operator in GP. It does not imply that 
they are unnecessary but it explicitly shows, at least, simple circumstances when 
they are not. This may advise GP practitioners to assure themselves empirically 
that a population and a crossover operator are needed (alone or together) when 
they start their algorithm development. These algorithms and operators are 
simple and easy to code, yielding a quick solution to the problem while at the 
same time supporting parallelism and featuring similar efficiency to conventional 
GP. Even though the problem at hand is not likely to have the simple problem 
structure observed here and even though it may require a more sophisticated 
operator, starting from a provably correct algorithm provides a platform for 
rationally exploring how to address the separate challenges presented by a harder 
problem or designing an operator specifically for the problem. 

Additionally, this analysis contrasts with conventional GP design practice. 
Conventionally, GP design proceeds in a very practical manner, but one which is 
antithetical to theoretical algorithm designers. Rather than derive an algorithm 
that is provably correct and of efficient complexity, practitioners use biological 
inspiration, empirical insight and current GP theory. This current theory tries 
to provide transparent explanations of how GP bloats, how it constructs solu- 
tions from schema and how it navigates a fitness landscape with its operators 
and selection. The resulting heuristic can be expected to generate initial mixed 
results and require subsequent trial and error to "perfect" its use on the problem 
of interest. There exist some best practices and rules of thumb for robust algo- 
rithms but little useful guidance on algorithm customization (via, e.g. genetic 
operators) for this subsequent design phase. The process finally yields a heuris- 
tic which, though a randomized algorithm, is intractable to analyze post-hoc 
for correctness or efficiency. One can offer to a user its computational expense, 
which is the product of population size, generations, and number of runs, as 
well as some empirical estimate of the likelihood of finding a sufficiently opti- 
mal solution on a future problem instance. An open question is whether the 
algorithm design methodology taken in this contribution, that of algorithmic 
theoreticians, could be blended with or complement the method of practice. 
Our methodology yields a fundamentally different new form of theoretical re- 
sult for GP: a randomized algorithm of established computational efficiency 
(which is different from computational expense), that is guaranteed to find a 
solution. However, our analysis is tractable only because the fitness structure of 
the model problems is simple, and the algorithms use only a simple hierarchical 
variable length mutation operator. It is an open question as to whether the first 
pass application of the simple algorithms and operators on a realistic problem 
might prove useful for insight or well founded design choices. Forums such as 
the annual Genetic Programming: From Theory to Practice workshop which 
encourage explicit interactions among theoreticians and practitioners may en- 
courage the investigation of this question and provide a means of collecting the 
experiences. 

5.4 Analysis Techniques 

We also comment briefly on our analysis techniques. The analysis of ORDER 
used the method of fitness partitions [2], a very general method that has found 
many applications in the complexity analysis of EAs for binary representation. 
The method was successful because, in ORDER, we are always only one move 
away from expressing a particular variable. However, MAJORITY required 
different analysis techniques because, unlike in ORDER, there is not an easily 
computable probability of improving the fitness given only the fitness value; 
there is also a crucial dependence on the neutral moves we make. The coupon 
collector and random walk method considered optimizing all of the variables 
jointly, and in doing so achieved a bound on performance very close to the theo- 
retical lower bound. This indicates that perhaps more complex fitness functions 
that are not separable might be amenable to similar styles of analysis. 

5.5 Future Work 

We see three main directions for future work in the computational complexity 
analysis of genetic programming. Obviously the goal of bridging a gap from what 
exists to practice is daunting. However, modest steps forward may be revealing. 
The first extension is to increase the complexity of the genetic operators that are 
acting on these two problems. Our 1 + 1 operators are essentially just stochastic 
hill climbers, and while understanding of such an optimization technique is 
valuable in and of itself, real-world GP implementations clearly involve more 
individuals. From GA theory, there is a precedent of taking 1 + 1 analysis of a 
problem and extending it to $\mu$ + 1 analysis, where $\mu$ is the size of the population 
(see e.g. dH). This would admit tree-based crossover operators (if they can be 
shown to be necessary for an efficient optimization time). 

The other extension is to consider harder problems. While ORDER and 
MAJORITY each capture a couple of very simple properties of a program, both 
rely upon inspection and neither problem's fitness function takes into account 
the hierarchical nature of a GP tree, which is of crucial importance for all 
practical applications of GP. We could extend these problems in several ways to 
address the latter issue. One could keep the same terminal set and join nodes, 
but make the fitness function take subtrees into account somehow. Alternatively, 
one could introduce a new type of join operation and use the fitness function to 
impose a constraint that forces us to optimize the higher levels of the tree as well, 
perhaps by giving higher fitness to individuals when this "join prime" has regular 
joins as its children and vice versa. Either of these changes would increase the 
interest of the problem and make the results more relevant to the way that GP 
is used in practice. However, any new problem in this form may require new 
or modified mutation operators to be amenable to analysis. We could also 
try to extend the difficulty of MAJORITY by designing a new objective that 
requires the correct material (and no incorrect material) to be in the right order, 
in addition to being present. A model problem which abstracts the semantics 
of iteration might also provide insights, if tractable. Whether the currently 
used proof techniques (fitness partitioning, random walks and use of the coupon 
collector) are sufficient to address more challenging setups is an open question. 